Abstract 

Background: Bacteraemia is a frequent and severe condition with a high mortality rate. Despite profound knowledge about 
the pre-test probability of bacteraemia, blood culture analysis often results in low rates of pathogen detection and therefore 
increasing diagnostic costs. To improve the cost-effectiveness of blood culture sampling, we computed a risk prediction 
model based on highly standardizable variables, with the ultimate goal to identify via an automated decision support tool 
patients with very low risk for bacteraemia. 

Methods: In this retrospective hospital-wide cohort study evaluating 15,985 patients with suspected bacteraemia, 51 
variables were assessed for their diagnostic potency. A derivation cohort (n = 14,691) was used for feature and model 
selection as well as for cut-off specification. Models were established using the A2DE classifier, a supervised Bayesian 
classifier. Two internally validated models were further evaluated by a validation cohort (n = 1,294). 

Results: The proportion of neutrophil leukocytes in differential blood count was the best individual variable to predict 
bacteraemia (ROC-AUC: 0.694). Applying the A2DE classifier, two models, model 1 (20 variables) and model 2 (10 variables) 
were established with an area under the receiver operating characteristic curve (ROC-AUC) of 0.767 and 0.759, respectively. 
In the validation cohort, ROC-AUCs of 0.800 and 0.786 were achieved. Using predefined cut-off points, 16% and 12% of 
patients were allocated to the low risk group with a negative predictive value of more than 98.8%. 

Conclusion: Applying the proposed models, more than ten percent of patients with suspected blood stream infection were 
identified as having minimal risk for bacteraemia. Based on these data the application of this model as an automated decision 
support tool for physicians is conceivable leading to a potential increase in the cost-effectiveness of blood culture sampling. 
External prospective validation of the model's generalizability is needed for further appreciation of the usefulness of this 
tool. 
Background 

Bacteraemia is a frequent and severe condition with an 
annualized incidence of 122 per 100,000 people. The mortality 
rate ranges between 14% and 37% [1-3]. Risk factors for 
bacteraemia are advanced patient age, urinary or indwelling 
vascular catheters, fulfilment of two or more SIRS criteria, 
impaired renal or liver function, malignancy or other chronic 
co-morbidities [4-8]. Although blood culture analysis is considered 
the gold standard for diagnosing bacteraemia in patients with 
suspected blood stream infection, the clinical decision of when to 
take a blood culture is not trivial. Despite profound knowledge 
about the pre-test probability of positive blood culture results, 
which is strongly influenced by the site of infection, true positive 
rates identifying a causative pathogen are in a low range when 
consecutively assessed (4.1%-7%) [9-11]. Compared to the true 
positive rate, false positive results due to contamination are in a 
similar or even higher range, varying from 0.6% to over 
8% [11-13]. Importantly, these imperfections of blood culture 
analysis have a substantial economic impact, resulting in a 20% 
increase of total hospital costs for patients with false positive blood 
cultures [14-17]. Economic analyses estimate the costs related to a 
single false positive blood culture result at between $6,878 and $7,502 
per case [17-19]. 
To increase the cost-effectiveness of blood culture analysis, the 
identification of targeted patient cohorts is therefore highly 
needed. Several prediction systems for bacteraemia in special 
patient cohorts have been published with ROC-AUCs in a 
moderate range [20-24]. However, physicians are arguably 
inefficient in applying a multitude of available prediction scores 
for specific conditions and specific patient cohorts [25,26]. The 
aim of the current study was therefore to establish a machine 
learning based prediction system for inpatients and outpatients 
with suspected bacteraemia using highly standardized and 
routinely available laboratory parameters to identify those patients 
for whom blood culture sampling may safely be omitted due to 
very low pre-test probability for bacteraemia. 

Material and Methods 

Study Design and Data Collection 

The current study was designed as a retrospective cohort study, 
including inpatients and outpatients at the Vienna General 
Hospital, Austria, a 2,116-bed tertiary teaching facility. Between 
January 2006 and December 2010, patients with clinically 
suspected bacteraemia were included if blood culture 
analysis was requested by the responsible physician and blood was 
sampled for assessment of haematology and biochemistry. Patients 
younger than 18 years and patients with unavailable laboratory 
parameter results were excluded. Patients with a potential blood 
culture contaminant and those with missing or inaccurate 
identification to the species level were excluded from further 
analysis. Blood culture contamination was defined according to 
the criteria of Hall and Lyman [27]. Furthermore, patients with 
rare blood culture isolates (less than 0.15% frequency of positives) 
were also excluded. Patients' age, gender and 49 laboratory 
parameters (see Table 1) were used in the analysis. All laboratory 
parameters had been assessed in accordance with parameter-specific 
SOPs at the Clinical Department of Laboratory Medicine, 
Medical University Vienna, an ISO 9001:2008 certified and 
ISO 15189:2008 accredited facility. Anonymous raw data can be 
requested by contacting the corresponding author. Following 
national regulations each request will be evaluated for approval 
by the local human data safety commission. 

Figure 1. Selection process of the study population. 
Eligible patients: 23,765 
Excluded: patients less than 18 years: 3,879; unavailability of parameters: 3,389 (1); blood culture results not assignable: 464 (2); rare blood culture isolates: 48 (3) 
Selected patients: 15,985 
Derivation cohort: 14,691 (4); validation cohort: 1,294 (5) 
(1) Unavailability of laboratory variables; (2) contaminations or fungal growth; (3) blood culture results with less than 0.001% frequency; (4) study patients treated between Jan 1, 2006 and Jul 31, 2010; (5) study patients treated between Aug 1, 2010 and Dec 31, 2010. 
doi:10.1371/journal.pone.0106765.g001 

Ethical Considerations 

The study was approved by the local Ethics Committee of the 
Medical University Vienna (EC-Nr.: 333/2011) and conducted in 
accordance with the Declaration of Helsinki (1964, including current 
revisions), the rules of Good Clinical Practice (GCP, European 
Union) and the standards for the reporting of diagnostic accuracy 
studies (STARD). Since a retrospective study design was applied, 
informed consent was not sought from study participants. To 
assure anonymity, every study participant was assigned a 
consecutive identification number, which was exclusively used 
for further analysis. 

Evaluation method 

The data set was divided into a derivation set (Jan 1, 2006 to Jul 
31, 2010) and a validation set (Aug 1, 2010 to Dec 31, 2010) based 
on the date of inclusion. The derivation set was used for feature 
selection and model training. Feature selection and internal 
validation of the trained model were performed using a 10-fold 
cross-validation scheme. Results of the internal validation were 
used to set cut-off points for risk stratification of the study 
population. The Youden index method was applied to set optimal 
cut-off points [28,29]. Using the likelihood ratios (LR; LR-: 0.12, 
LR+: 4.93, see Figure S1) of corresponding cut-off values, three 
strata were established to group the patients into a low risk, 
intermediate risk and high risk group. For the low risk group, a 
cut-off point for the classification probability was set to yield 1% 
post-test probability for bacteraemia. For the high risk group, a cut-off 
point resulting in more than 30% post-test probability was 
predefined. Classification probabilities between these defined 
cut-off points were allocated to the intermediate risk group. To 
externally validate the discriminatory potency of the previously 
trained algorithm and risk strata, the validation set was used. 
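
The link between the likelihood ratios and the post-test probabilities named above can be checked with a short calculation. The following is a minimal sketch, not the authors' code: it assumes the pre-test probability equals the roughly 8% bacteraemia prevalence reported for the derivation set, and the function name `post_test_probability` is introduced here purely for illustration.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumption: pre-test probability = ~8% prevalence in the derivation set.
PREVALENCE = 0.08

low_risk = post_test_probability(PREVALENCE, 0.12)   # negative likelihood ratio
high_risk = post_test_probability(PREVALENCE, 4.93)  # positive likelihood ratio

print(f"low risk post-test probability:  {low_risk:.3f}")
print(f"high risk post-test probability: {high_risk:.3f}")
```

Under this assumption the two likelihood ratios give post-test probabilities of about 1% and 30%, which agrees with the thresholds stated for the low risk and high risk groups.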

Statistical Analysis 

For statistical analysis, WEKA (Version 3.7.10, GNU General 
Public License) and R (Version 3.0.2, GNU General Public 
License) were used [30]. Descriptive statistics of all variables 
indicated are given as median and interquartile range. For single 
variable analysis, the Mann-Whitney U-test, Pearson's chi-squared 
test and area under the receiver operating characteristic curve 
(ROC-AUC) analysis of individual variables were applied [31]. To 
train the multivariable models, variables with a high discriminative 
power were selected, using the wrapper subset evaluator algorithm 
and the correlation feature selection (CFS) subset evaluator of 
WEKA. The wrapper approach aims at selecting a relevant set of 
variables for a specific classification algorithm (in our case the 
A2DE algorithm, see below) [32]. The CFS subset evaluator 
evaluates the discriminatory power of a variable subset while 
accounting for the inter-correlation among its variables [33]. Furthermore, 
the effect of each variable was evaluated by a step-wise deletion of 
variables in the order of their individual Pearson's correlation 
coefficient with respect to the outcome. 
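
The single-variable ROC-AUC and the Mann-Whitney U statistic mentioned above are closely related: the AUC equals U divided by the number of (positive, negative) pairs. The sketch below illustrates this relationship in plain Python. It is not the WEKA or R routine used in the study, and the toy values are hypothetical.

```python
from typing import Sequence


def roc_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC computed as the normalised Mann-Whitney U statistic.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher value; ties count as one half.
    """
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: neutrophil proportion (%) and blood culture result.
neutrophils = [55.0, 62.0, 70.0, 74.0, 81.0, 88.0, 90.0, 66.0]
bacteraemia = [0, 0, 0, 1, 0, 1, 1, 0]

toy_auc = roc_auc(neutrophils, bacteraemia)
print(f"toy ROC-AUC: {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of patients; for a cohort of the size analysed here a rank-based implementation would be the more practical choice.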

Figure 2. ROC-AUCs assessed in relation to the number of variables used (x-axis: number of parameters, 1 to 20; y-axis: ROC-AUC). Variables are ranked according to their individual correlation coefficient with respect to the outcome; a significant decrease of the ROC-AUC is seen when more than one variable is deleted. 
doi:10.1371/journal.pone.0106765.g002 

For statistical modelling, several major groups of supervised 
machine learning algorithms were applied, including Bayesian 
classifiers such as Naive Bayes, artificial neural networks such as 
multilayer perceptrons, or support vector machines. The best 
results were consistently achieved with the averaged 2-dependence 
estimators (A2DE) algorithm. The A2DE, belonging to the 
averaging n-dependence estimator classifier group, is a semi-Naive 
Bayes method [34]. This group of algorithms assumes that 
each predicting variable depends on the outcome class and n other 
variables. In the case of the A2DE classifier, n equals two, whereas the 
classic Naive Bayes algorithm is a zero-dependence estimator, 
assuming that all variables are conditionally independent of 
each other [35,36]. In many real-world applications, this 
independence assumption is violated, leading to inadequate 
results. The Naive Bayes algorithm requires a two-dimensional 
table (outcome class and predicting variable) for indexing the 
probability estimates. In contrast, the A2DE requires two 
additional dimensions for the estimation of the two additional 
variable dependencies. Further, these classifiers aggregate the 
predictions made by a collection of n-dependence estimators [37]. 
These procedures decrease the bias but slightly increase the 
model's variance [38]. However, comprehensive experimental 
evaluations indicate that the A2DE's trade-off between bias and 
variance results in a good predictive accuracy for many 
applications and data sets [39-41]. 
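
As a reading aid, the difference between the zero-dependence (Naive Bayes) and the two-dependence (A2DE) estimator described above can be written out. The notation is an assumption introduced for illustration and is not taken from the study: y is the outcome class, x_1, ..., x_a are the a predictor values of one patient, and \hat{P} denotes probabilities estimated from frequency counts in the training data. This is a sketch of the commonly published form; implementation details such as smoothing or minimum-frequency thresholds are omitted.

```latex
% Naive Bayes: every predictor depends on the outcome class only
\hat{P}(y \mid \mathbf{x}) \;\propto\; \hat{P}(y) \prod_{k=1}^{a} \hat{P}(x_k \mid y)

% A2DE: every predictor depends on the class and on two "parent" predictors
% x_i, x_j; the estimates are summed (averaged) over all pairs of parents
\hat{P}(y \mid \mathbf{x}) \;\propto\; \sum_{1 \le i < j \le a} \hat{P}(y, x_i, x_j)
    \prod_{k=1}^{a} \hat{P}(x_k \mid y, x_i, x_j)
```

For k = i or k = j the conditional factor equals 1. The joint estimate \hat{P}(y, x_i, x_j) is what requires the two additional table dimensions mentioned in the text.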

For ROC-curve comparison, a paired t-test (comparison of 
paired cross-validation folds), the DeLong test or the Hanley and 
McNeil comparison test were applied to values of the ROC-AUC 
[42-44]. Furthermore, 95% confidence intervals of performance 
measures, including sensitivity, specificity, negative predictive 
value (NPV) or positive predictive value (PPV), were calculated 
with bootstrapping (2,000 iterations) [45]. Where appropriate, the 
Bonferroni-Holm method was used to control for type I errors, 
related to multiple testing. Statistical significance was defined as a 
p-value less than 0.05. 
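
The bootstrapped confidence intervals mentioned above can be illustrated with a small sketch. It assumes a simple percentile bootstrap, which the text does not specify, and uses hypothetical toy data; the function name `bootstrap_ci` and the fixed seed are illustrative choices, not part of the study.

```python
import random
from typing import List, Tuple


def bootstrap_ci(outcomes: List[int], iterations: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for a proportion.

    `outcomes` holds one 0/1 entry per case, e.g. 1 if a bacteraemic
    patient was correctly flagged by the model (for sensitivity).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(iterations):
        # Resample n cases with replacement and recompute the proportion.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[round((alpha / 2) * iterations)]
    upper = estimates[round((1 - alpha / 2) * iterations) - 1]
    return lower, upper


# Hypothetical example: 72 of 100 bacteraemic patients detected.
detected = [1] * 72 + [0] * 28
point_estimate = sum(detected) / len(detected)
ci_low, ci_high = bootstrap_ci(detected)
print(f"sensitivity {point_estimate:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The same resampling loop applies to specificity, NPV or PPV by changing which cases enter `outcomes`.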



Results 

Study population 

Between January 2006 and December 2010, blood culture 
analysis was requested for 23,765 patients. Figure 1 presents the 
selection process of patients. Patients less than 18 years old 
(n = 3,879), patients with unavailable laboratory parameter results 
(n = 3,389), patients with blood culture contamination, patients 
with blood culture results having missing or inaccurate identifica- 
tion to the species level and fungal growth (n = 464) and patients 
with rare blood culture isolates (n = 48) were excluded from 
analysis. The final study population consisted of 15,985 patients. 
Among them, 1,286 patients (8%) had a positive blood culture 
result. The most prevalent bacteria were E. coli (n = 406, 31.5%), S. 
aureus (n = 297, 23.1%), and K. pneumoniae (n = 83, 6.5%). Patient 
characteristics are presented in Table 1. According to a predefined 
temporal criterion (cut-off date: Aug 1, 2010), the data set was 
divided into a derivation set (n = 14,691, 8% bacteraemia) and a 
validation set (n = 1,294, 8.2% bacteraemia). 

Feature selection and model training 

Among 51 available variables in the derivation set, 40 variables 
showed a statistically significant difference between bacteraemia 
and non-bacteraemia patients. The best individual discriminatory 
variable was the proportion of neutrophil leukocytes in differential 
blood count (p<0.0001) with an ROC-AUC of 0.694 (CI: 0.686- 
0.702). At the Youden Index cut-off point, the relative amount of 
neutrophils resulted in 61.95% (59.1%-64.7%) sensitivity and 
67.6% (66.8%-68.4%) specificity. Among all variables, 
20 variables were selected by the wrapper approach (model 
1), which were further evaluated by the CFS subset evaluator 
(model 2). Finally, model 2 consisted of ten variables, including 
patient's age, proportion of neutrophils, monocytes (absolute and 
relative value), eosinophils (absolute value), lymphocytes (absolute 
value), sodium, C-reactive protein, creatinine and total bilirubin 
(Table 2). Other feature selection steps were also evaluated, 
resulting in models with lower ROC-AUCs than described below. 

Figure 3. Graphical result of the validation cohort, model 1: 16% low risk cohort with 2 false negative patients; model 2: 12% low risk cohort with 3 false negative patients. Values recoverable from the model 2 panel: validation cohort 1,294 (100%); low risk 157 (12%), of which 3 false negative; high risk 99 (8%), of which 68 false positive and 31 true positive; remaining (intermediate risk) patients 966 without and 72 with bacteraemia. 
doi:10.1371/journal.pone.0106765.g003 

A number of applicable classes of supervised machine learning 
techniques including artificial neural networks and support vector 
machines were screened in the model selection process. Figure S2 
presents ROC-curves of various classifiers. The best results in 
ROC curve analysis were achieved by applying the A2DE 
classifier yielding an ROC-AUC of 0.767 (CI: 0.754-0.781) in 
model 1, and of 0.759 (CI: 0.745-0.773) in model 2, respectively. 
This classifier is conceptually simpler than other algorithms 
available, and consistently produced better results in ROC-AUC 
analysis than the other classifiers tested. Generally, the models' calibration 
appears to be good. Calibration plots are shown in Figure S3. 
Model 1 shows a modest risk for overestimation for patients at 
higher bacteraemia risk. This overestimation effect is not seen in 
model 2, which therefore appears to be very well calibrated. 

Using the Youden Index method to set an optimal cut-off point, 
model 1 yielded 72.1% sensitivity and 70.3% specificity with 
17.3% PPV and 96.7% NPV. Model 2 yielded 67.7% sensitivity 
and 72.8% specificity with 17.8% PPV and 96.7% NPV. Different 
cut-off points were used to establish a low risk, an intermediate risk 
and a high risk group for bacteraemia. Table 3 summarizes 
diagnostic prediction measures when using different cut-off points. 
Importantly, the low risk group demonstrates an NPV of 98.84% 
(model 1) and 99.14% (model 2), respectively. 
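
The relationship between sensitivity, specificity, prevalence and the predictive values reported above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it plugs in the rounded model 1 figures at the Youden cut-off and assumes the roughly 8% prevalence of the derivation set, so the output is approximate.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Model 1 at the Youden cut-off: 72.1% sensitivity, 70.3% specificity.
# Assumption: prevalence of ~8% as reported for the derivation set.
ppv_1, npv_1 = predictive_values(0.721, 0.703, 0.08)
print(f"model 1: PPV {ppv_1:.1%}, NPV {npv_1:.1%}")
```

With these rounded inputs the calculation gives a PPV of about 17.4% and an NPV of about 96.7%, close to the reported values. It also shows why the PPV stays low at any cut-off with moderate specificity when prevalence is only 8%.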

Effects of feature reduction and missing values 

To estimate the effect of omitting variables with low predictive 
power, variables of model 1 were ranked according to their 
individual Pearson correlation coefficient against the outcome 
variable and deleted step by step in that order. The majority of 
deletion steps led to a significant decrease of the ROC-AUC. 
Figure 2 summarizes this deletion procedure. 

Due to the retrospective study design, some variables were not 
available for all patients (Table 2). For most variables, less than 
10% missing values were observed, with the exception of 
cholesterol (34% missing values), amylase (27%), creatine kinase 
(14%) and magnesium (13%). When replacing missing values with 
the mean value of the corresponding group ("value imputation"), 
no significant differences in ROC-AUCs were detected (model 1: 
ROC-AUC = 0.77, p = 0.85; model 2: ROC-AUC = 0.76, 
p = 0.09). 
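
Mean imputation of the kind described above can be sketched in a few lines. The text does not state how the "corresponding group" was defined; the example below assumes grouping by blood culture outcome, and both the helper name `impute_group_mean` and the cholesterol values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def impute_group_mean(values: List[Optional[float]], groups: List[str]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values in the same group."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for value, group in zip(values, groups):
        if value is not None:
            sums[group] += value
            counts[group] += 1
    imputed = []
    for value, group in zip(values, groups):
        if value is None:
            if counts[group] == 0:
                raise ValueError(f"no observed values in group {group!r}")
            imputed.append(sums[group] / counts[group])
        else:
            imputed.append(value)
    return imputed


# Hypothetical cholesterol values with gaps; grouping by outcome is an assumption.
cholesterol = [180.0, None, 200.0, 150.0, None, 170.0]
outcome = ["negative", "negative", "negative", "positive", "positive", "positive"]

filled = impute_group_mean(cholesterol, outcome)
print(filled)
```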

Validation set 

To test the generalizability of the established models, a 
validation set (n = 1,294) was used. Model 1 achieved an ROC-AUC 
of 0.80 (CI: 0.76-0.84, see Figure S4). Model 2 yielded an 
ROC-AUC of 0.79 (CI: 0.74-0.83). No significant differences 
were found between ROC-AUCs derived from the validation set 
and the corresponding ROC-AUCs derived from the derivation 
set (model 1: p = 0.1542, model 2: p = 0.2594). 

When applying the cut-off points predefined by the Youden 
index method in the derivation cohort, model 1 yielded a sensitivity 
of 79.3% and a specificity of 68.4% with 18.4% PPV and 97.4% 
NPV. Model 2 achieved a sensitivity of 80.2% and a specificity of 
70.0% with 19.3% PPV and 97.5% NPV. Using the predefined 
cut-off points for the risk model and applying model 1, 16% of the 
patients (n = 202) were allocated to the low risk group and 7% 
(n = 89) to the high risk group. Among the patients in the low risk group, only 
2 patients were false negatives. Similarly, applying model 2, 157 
patients (12%) were allocated to the low risk group with 3 false 
negatives. Details of the risk model are provided in Table 2, while 
Figure 3 shows a tree-based graphical representation of the 
prediction outcome. 
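
The negative predictive value in the low risk groups follows directly from the counts given above. The short sketch below retraces that arithmetic for both models; the function name is illustrative.

```python
def negative_predictive_value(n_test_negative: int, n_false_negative: int) -> float:
    """NPV = true negatives / all patients classified as negative (here: low risk)."""
    if n_test_negative <= 0 or not 0 <= n_false_negative <= n_test_negative:
        raise ValueError("counts are inconsistent")
    return (n_test_negative - n_false_negative) / n_test_negative


VALIDATION_COHORT = 1294

# (patients in low risk group, false negatives among them), as reported above
low_risk_groups = {"model 1": (202, 2), "model 2": (157, 3)}

results = {}
for model, (n_low_risk, n_missed) in low_risk_groups.items():
    npv = negative_predictive_value(n_low_risk, n_missed)
    share = n_low_risk / VALIDATION_COHORT
    results[model] = (npv, share)
    print(f"{model}: NPV {npv:.2%}, low risk share {share:.1%}")
```

This gives an NPV of about 99.0% for model 1 and about 98.1% for model 2, with 15.6% and 12.1% of the validation cohort allocated to the low risk group.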

Discussion 

The goal of the current study was to assess the discriminatory 
power of machine learning models with frequently requested 
variables for predicting negative blood culture results in inpatients 
and outpatients with suspected bacteraemia. The 
cost-effectiveness of blood culture analysis depends strongly on 
the diagnostic yield, and an automated tool improving 
the selection of patients may therefore increase cost-effectiveness. 
Several scoring systems predicting the probability of a positive 
blood culture result in a specific patient cohort have been 
published previously [20,21,46-48]. However, since these scores 
require manual calculation by the physician, they are often 
not applied. Our approach was to compute a potentially 
automated decision support tool to improve the cost-effectiveness 
of blood culture sampling using highly standardized data, resulting 
in ROC-AUCs between 0.759 and 0.804. Based on these models, 
the NPV was 99.01% for model 1 and 98.1% for model 2 for 
patients at low risk for bacteraemia. Based on these results, the 
proposed support tool would be able to safely reduce 
blood culture sampling by 12-16%, leading to a reduction of costs. 

In this study, statistical analysis was restricted to laboratory 
parameters as well as gender and patient's age, which are all 
readily available and highly standardized. These variables 
combine the advantage of reproducibility and availability as 
opposed to most clinical variables. 

Pre-test probability of bacteraemia may vary considerably 
between studies, potentially affecting the diagnostic accuracy 
of prediction models [10,11]. Our results are similar to those of a 
previous study by Pfitzenmeyer et al. reporting an 8.2% prevalence 
of bacteraemia [49]. Nakamura et al. published a hospital-based 
study with a 19.5% prevalence of bacteraemia, predicting 
bacteraemia with an ROC-AUC of 0.73 [47]. The prevalence of 
bacteraemia (19.5%) in that study is higher than generally reported 
for hospital-based studies, and its results may therefore lack generalizability 
[10,11]. Finally, Jin et al. evaluated a Bayesian algorithm for the 
prediction of bacteraemia in 19,303 patients, yielding an ROC- 
AUC of 0.70 [50]. In contrast to our study, however, laboratory 
markers included in the analysis were allowed a considerable lag 
time to blood culture sampling of up to 72 hours, or even 7 days in 
the case of albumin and alkaline phosphatase. Considering the 
dynamic evolution of inflammation markers, this discrepancy in 
sampling times may have had an important impact on their results. 